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ABSTRACT 


Analysts often concern themselves with the tail regions of distributions, sometimes 
called “extreme events,” in order to measure or predict risk. One risk metric, the superquan¬ 
tile, possesses several properties that make it particularly well-suited for risk quantification. 
Observable data, however, often lack information on extreme events due to various resource 
constraints, resulting in sample superquantile estimates that often undervalue the true level 
of risk. By leveraging the dual relationship between superquantiles and superexpectations, 
we apply constrained optimization on second-order epi-splines to arrive at incrementally 
better approximations of superquantile values. With these improved estimates, we incorpo¬ 
rate additional constraints to improve the fidelity of density estimates in tail regions. We 
limit our investigation to data with heavy tails, where risk quantification is typically the 
most difficult. Demonstrations are provided in the form of a known distributional bench¬ 
mark, historical financial data, and a fluid dynamics model used in the development of a 
high-speed naval vessel. Results show that accurate quantile and superquantile constraint 
implementation, in conjunction with empirical statistics and distributional knowledge, can 
improve tail density estimates by up to 15°/c for small samples of various heavy-tailed 
distributions. 
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Executive Summary 


Experimental data collection is often limited in the number of observations due to bud¬ 
getary constraints, computational complexity, or time. In these cases, samples often lack 
the “extreme events” that constitute the tail regions of the underlying probability density 
function. In order to assess and quantify risk, analysts must often extrapolate probability 
estimates into regions beyond the range of observable data. This risk assessment is partic¬ 
ularly important for distributions that exhibit heavy-tailed behavior, where extreme events 
occur more frequently. 

Our goal is to provide a set of accurate risk quantification constraints for limited 
data that enhance the fidelity of probability density estimates in outlying tail regions. Due 
to its desirable qualities as a risk metric, we use the superquantile at a given percentile to 
help inform tail density estimates. By incorporating an accurate superquantile estimate (or 
set of estimates) into a constrained optimization problem using first- and second-order epi- 
splines, we allow for the incorporation of a flexible combination of empirical knowledge and 
distributional soft information that can be uniquely tailored to the application in question. 

This thesis proposes a two-step process that first approximates superquantile val¬ 
ues before using them within an overall density estimation framework. Superquantiles are 
estimated using constrained second-order epi-spline optimization that leverages the dual 
relationship between superquantiles and superexpectations as well as additional soft in¬ 
formation to help inform shape constraints. We investigate the impact of distributional 
knowledge on both quantile and superquantile predictions, demonstrating how enhanced 
constraint formulations arrive at incrementally better estimates. Once obtained, these quan¬ 
tile and superquantile estimates are incorporated into a second constrained optimization 
problem that estimates the underlying density function using first-order epi-splines. Within 
this second phase, we explore optimizations under both maximum l ik elihood and maximum 
entropy formulations. 

The method posited here is evaluated across three cases. In the first case we assess 
the method against two well-defined parametric benchmarks, demonstrating and measuring 
the impact of additional information. Next, we apply the method to a heavy-tailed set of 
financial data, comparing our results with those attained via a widely used existing method. 
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Finally, we investigate a multi-fidelity data set from a fluid dynamics model to assess the 
feasibility of surrogacy for quantile estimates and the benefits of mixture modeling in risk 
quantification. 

Results show that accurate quantile and superquantile predictions are key to suc¬ 
cessful constraint implementation for overall density estimation. Given accurate estimates 
for outlying quantiles and superquantiles, overall density estimation improves by up to 
15%, with tail density estimates improving by up to 80% in the case of maximum entropy 
formulations. The impact here is improved accuracy in tail density estimation, providing 
better risk mitigation planning for “extreme events,” and without the need for large sample 
sizes. 



CHAPTER 1: 
Background 


The world is filled with uncertainty — a lack of total knowledge that forces us to make 
decisions with access to only a fraction of the complete picture. How much money should 
a city allocate in the budget to cover the recovery costs of the next category-five hurricane? 
How confident are engineers that the tensile strength of the steel cables they tested will live 
up to the design specifications for a new bridge? In attempting to answer these questions, 
risk analysts rely heavily on data to provide accurate estimates or bounds on the likelihood 
of an event, and their findings represent a significant and growing area of research in 
numerous fields including insurance, financial management, and reliability engineering. 
The question, however, of how to estimate the probability of events that exist beyond the 
range of observed data requires extrapolation that often leads to inaccurate estimates for 
the probability of extreme events. Within this thesis, we look at a specific risk measure 
(, superquantiles ) to help inform probability density estimates for rare or extreme events 
coming from an unknown distribution with access to limited samples of data. 


1.1 Probability Density Estimation 

Probability density estimation is the construction of probability density functions (PDFs) 
from an observed set (or sets) of data for which the true underlying distribution is unknown. 
By estimating the underlying density of a random variable, we hope to predict future ob¬ 
servations according to some probabilistic model, where predictions become more accurate 
as the fidelity of the model improves. Methods for density estimation include both para¬ 
metric methods such as the maximum likelihood estimation (MLE) formulations proposed 
by Fisher between 1912 and 1922 [1], as well as non-parametric techniques such as kernel 
smoothing [2], support vector machines (SVM) [3], and quasi-MLE for unspecified mod¬ 
els [4]. While parametric techniques can be quite powerful even with limited sample sizes, 
they are highly dependent on the assumption of the density’s underlying parametric struc¬ 
ture. If one assumes an invalid distributional family, or if the data itself is non-parametric 
in nature, these methods can result in grossly inaccurate predictions. 

Nonparametric techniques provide more flexibility in modeling data, but can often 
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rely heavily on sample size to arrive at accurate density estimates. In theory, if one were 
afforded millions of observations, a nonparametric density estimate would perform quite 
well, even for wildly nonparametric distributions. When provided only limited samples, 
however, nonparametric techniques can perform poorly in estimating densities for outlying 
tail regions where observations are likely to be absent. In fact, in a sample of 30 independent 
and identically distributed (IID) observations, there is a roughly 20% chance the data will 
lack an observation coming from the 5% upper tail region, as shown in Figure 1.1. 


Figure 1.1: Upper Tail Data Inclusion by Sample Size 



Sample Size 

Probability of not seeing any values within the 5 % (solid) and 1% (dotted) upper tail regions 
for IID observations. Clearly, sample size is of great consequence in estimating tail density. 

Inaccurate estimates on tail densities become especially problematic when one 
attempts to answer questions regarding the uncertainty or risk of extreme events, such as 
the probability of falling above or below some threshold value. Given realistic constraints 
on resources, analysts are often forced to confront limited data sets possessing few or no 
observations from the extreme tail regions. This shortfall is especially prominent in densities 
where extreme events are more likely to occur, a characteristic referred to as heavy-tailed. 
An example of this is offered in Figure 1.2, which displays a prototypical underestimation 
of tail density from a well-known heavy-tailed distribution. 
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Figure 1.2: Example of Tail Density Underestimation 



PDF estimate for 20 observations coming from a Weibull distribution with shape parameter 
k = 1.5 and scale parameter A = 3. Optimization of first-order epi-splines was performed on 
the basis of maximizing log-likelihood subject to various constraints such as monotonicity 
and tail convexity. Notice that no data exists beyond the 90th quantile. As a result, the 
estimate fails to provide sufficient density in the tail region. 


1.2 Tail Density Characterizations 

The risk implications for tail estimation can depend greatly on the distribution’s tail behavior. 
Take for instance a normal random variable, X ~ N(/u, cr). We know by virtue of its light 
tails that 99.7% of the data is bounded within three standard deviations of the mean. Now 
compare this to a Cauchy random variable (known to have a heavy tail) with an undefined 
variance, and the prospect of bounding the uncertainty becomes much more difficult. In fact, 
even something as simple as estimating the mean for a heavy-tailed distribution can be quite 
difficult, as explained in [5]. In cases where the underlying distribution exhibits behavior 
that can be characterized as heavy-tailed,, the consequences of inaccurate tail estimation 
become of greater consequence. Given our specific interest in such distributions, we limit 
the remainder of our analysis to those distributions and data sets for which we know, or 
suspect, heavy-tailed behavior. 
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1.2.1 Heavy Tails 

Heavy-tailed distributions have tails which are not exponentially bounded, with common ex¬ 
amples including the Pareto, Burr, and Cauchy distributions. Since the tail of an exponential 
distribution is (by definition) exponentially bounded, these can serve as illustrative compar¬ 
isons to demonstrate characteristics of heavy-tails. Defined more explicitly, a distribution 
is characterized as heavy-tailed if 

lim e Ax Pr[X > x\ = oo Vi > 0. (1.1) 

X—>oo 

Heavy-tails can be further defined as one of three general types: fat-tailed, long¬ 
tailed, or sub-exponential. Specialized algorithms for simulating heavy-tailed distributions 
can be found in [6], with more detailed methodologies proposed for handling specific cases. 
Here, we explore the particular characteristics of fat-tailed distributions. 

1.2.2 Fat Tails 

Fat-tailed distributions are a sub-class of heavy-tailed distributions for which the probability 
density function goes to zero as a power of a for large x. As such, they are always 
bounded below by the probability density of an exponential distribution. Mathematically, a 
distribution is said to have a fat tail if 

Pr[X > x] ~ x~ a as x —> oo, a > 0, (1.2) 

where ~ refers to the asymptotic equivalence of functions. Fat tails are frequently used to 
model financial and economic applications where extreme events must be accounted for, 
such as the insurance industry and commodity markets [7]—[9]. Another application is in 
modeling natural disasters [10], where the question posed earlier regarding the category-five 
hurricane serves as an illustrative example. A visual comparison between a fat tail and an 
exponential case is provided in Figure 1.3. 

1.3 Superquantiles as a Risk Measure 

The methods by which we interpret, measure, and communicate uncertainty forms the 
basis for risk quantification. The probability of exceeding a threshold value (as mentioned 
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Figure 1.3: Comparison of Fleavy vs. Exponentially Bounded Tails 




(a) Density Functions (b) Tail Region 

PDF plots with an expanded view of the tail region (left) for an exponential (solid blue) and 
Pareto (dotted red) distribution possessing equal expected values. Notice the lower bounding 
performed by the exponential case in comparison to the fat-tailed Pareto distribution in the 
tail region. 


earlier) is one such risk measure, which in this case uses a particular quantile value as the 
risk measure. Quantiles, however, are just one of many such risk measures that can be 
used. In fact, a quantitative evaluation of risk relies on a measure that properly relates to 
the specific application in question, where certain desirable qualities can make a measure 
more or less useful and appropriate for a given circumstance [11]. Of particular note are 
superquantiles , which possess several features that make them particularly well-suited as 
risk measures. Known more commonly as “conditional value-at-risk,” “average value- 
at-risk,” “tail value-at-risk,” and “expected shortfall” from their applications in financial 
analysis, many analysts have come to recognize superquantiles for their desirable properties 
of coherency and regularity in assessing risk under incomplete or inaccurate probabilistic 
models [11], [12]. The analysis of superquantiles originated in financial engineering where 
risk lies predominantly in the lower tail of the underlying distribution. We modify this 
convention for our purposes here to assess the upper tails instead, realizing that the methods 
are equally valid with the addition of a simple sign change on the data of interest. 

Given a random variable X, we can calculate the quantile of X at a certain percentile 
p as the threshold value below which proportion p of the underlying density function exists. 
If X is continuous, then the quantile function Q(p) is simply the inverse of the cumulative 
distribution function (CDF). A si/per-quantile is (generally) an average of quantiles for 
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percentiles above a given threshold. When the CDF of X is continuous, the superquantile 
becomes the conditional expectation of X above the given quantile threshold value [11]. 
Put another way, a superquantile is the expected outcome provided you are known to be 
operating in the top p-tail of the underlying probability distribution. Visual depictions of 
these relations between superquantiles and the density and quantile functions are provided 
in Figure 1.4. Mathematically, we define the superquantile Qx of the distribution as a 
function of a specific percentile p, where Qx(p ) equals the expectation in the upper p-tail 
distribution of X, for p e (0,1). As such, the superquantile function becomes 

1 

Qx{p) = t - / Qxip'W for pe( 0,1). (1.3) 

1 -Pjp 

Provided the knowledge of a distribution’s superquantile at a given percentile, we 
seek to leverage this information in order to make better estimates of that distribution’s tail 
density. By arriving at better estimates of superquantile values for p in ranges closer to one, 
we hope to more accurately assess the probability of exceeding a particular risk threshold. 


Figure 1.4: Quantile and Superquantile Relationship 




(a) Density Function (b) Quantile Function 

PDF and quantile plots for a Weibull distribution. Quantile and superquantile values are 
identified by the dotted red lines, with the upper tail region shaded. With continuity in the 
density, the superquantile value is the expected value of this upper tail region. 


6 
















1.4 Superexpectations and their Dual 


Related to the superquantile function, the superexpectation function Ex of a random variable 
X is the expected value of the maximum of X and some threshold value x, which we term 
the level. Thus, the value Ex{x) is the superexpectation of X at level x defined by 

E x (x) = E[ max{.r,X}] = [ ma x{x,Q x (p)}dp. (1.4) 

Jo 

If we require E\X\ < oo, then the superexpectation function has the properties of being 
closed, proper, and convex. Using the Legendre-Fenchel transform, we are able to arrive 
at a conjugate (“dual”) function of the superexpectation which is also closed, proper, and 
convex [11]. We refer to this as the dual of superexpectations (DSE) E* x {p), here denoted 
with a star, and defined as a function of p such that 


E* X (P) 


-(1 -p)Qx(p) for p € (0,1), 
- E[X ] for p = 0, 

|0 for p = 1, 


(oo for p i [0,1]. 


(1.5) 


1.4.1 Relation of Dual of Superexpectation to Superquantile 

Of importance is the relationship of the DSE to superquantiles, where one can simply be 
written as a function of the other on the interval p e (0,1), as in Equation 1.5. Furthermore, 
we note the known endpoints for cases where p = 0 and p - 1, as well as the fact that the 
left-derivative of E* x at a percentile p is in fact the underlying density function’s quantile 
value at that same p, as can be seen in Figure 1.5 [11]. That is, 

j p E* x (p) = Q x (p). (1.6) 

Additionally, if X possesses a finite second moment, there exist upper and lower bounds on 
superquantile estimates for values between the endpoints [11]. In this way, for p e [0,1), 
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superquantiles are bounded according to 

E[X] < Q x (p) < E[X] + -^L, (1.7) 

V 1 ~P 

provided that E\X 2 \ < <x>, and where <r(X) is the standard deviation of X. This requirement 
is not extraneous, as several known heavy-tailed distributions, such as the Cauchy distribu¬ 
tion mentioned earlier, do not in fact meet this criterion. Nonetheless, these characteristics 
provide the foundation for estimating the superquantile for a set of data, which underpins 
much of the methodology pursued here and outlined in Chapter 2. 


Figure 1.5: Fleavy-Tailed Quantile Characteristics 




(a) Quantiles (b) DSE 

A comparison of the quantile (left) and DSE (right) functions for the exponential distribution 
(blue solid) and Pareto (dotted red). Notice that the quantile function values correspond 
exactly to the slope of the DSE function. Furthermore, note the bounding provided by the 
exponential case in (b). This exponential upper-bounding is related to the lower-bounding it 
performs on the tail density seen before in Figure 1.4b. 

1.5 First- and Second-Order Epi-Splines 

Epi-splines are a flexible set of piecewise polynomial functions that can be used to describe 
practically any function one could reasonably expect to encounter. By modifying a set of 
constraints that define the problem either through empirical information from the data itself 
or from external soft information one can develop a framework that identifies the coefficient 
values that define the piecewise epi-spline function. Epi-splines are well suited to handling 
multiple shape-constraints and approximating density functions through the maximization 
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of either a likelihood function or entropy function, even for small sets of data, as shown 
in [13]. 


1.5.1 Epi-Splines Model Formulations 

Given a closed interval [/, u\ constituting the bounds of estimation within R, we segment 
the interval into evenly spaced mesh segments. Here, we will distinguish between a 
second-order epi-spline mesh k used for DSE estimation from a first-order mesh m used 
for density estimation. As such, we have K evenly spaced second-order mesh segments 
defining p according to p = {// | k = 1,2,.... K}, and M evenly spaced first-order mesh 
segments defining x according to x = {x m \ m = 1,2,..., M}. Mesh segments are right- 
continuous, with endpoints defined as either left {pi , . 17 J or right (/?/?, xr) respectively, as 
seen in Figure 1.6. Our epi-spline segments are thus defined either as first- or second-order 
polynomials according to their mesh assignment. 

Second Order: E* x (p) = a k } + a\p + a k p 2 for p e [p k L , p k R ) (1.8) 

First Order: f(x) = /?"/ + b'\ l x for jc 6 [x™, v'") (1.9) 


To differentiate notation between formulations, we use a-coefficients for the second- 
order optimization defined via mesh index k , and ^-coefficients for the first-order optimiza¬ 
tion defined via mesh index m. The details of epi-spline implementations are covered in 
Chapter 2. 


1.6 Soft Information Incorporation 

Shortfalls in sample size can often be partially mitigated through the incorporation of soft 
information that provides qualitative information on the underlying density function in ad¬ 
dition to the empirical information provided by the data itself. Examples of soft information 
include density characteristics such as monotonicity, tail convexity, uni-modality, and many 
others. Combinations of soft information through constrained optimization leads to better 
density estimates for limited data sets when implemented in an intelligent framework, as 
shown in [14], [15]. 
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Figure 1.6: First- and Second-Order Epi-Spline Meshes 




(a) First-Order Epi-Splines (b) Second-Order Epi-Spiines 

First- and second-order epi-splines with associated mesh. First-order epi-splines are on the 
left, with second-order epi-splines displayed on the right. Notice the lower semi-continuous 
(lcs) property enforced at mesh intersections. This will require continuity to be enforced 
throughout. By increasing the mesh resolution, we can arrive at arbitrarily close approxima¬ 
tions to virtually any curve using these low-order splines. 


1.7 Overview 

Expanding on earlier work done using epi-splines for density estimation, we plan to im¬ 
plement an additional set of constraints within the constrained optimization formulations 
to incorporate information on tail characteristics and superquantile bounds. In Chapter 2, 
we provide detailed steps for constrained optimization formulation for both superquantile 
estimation and density estimation, identifying objective functions and a set of possible con¬ 
straints used. In Chapter 3, we estimate the superquantile values and density functions 
for two parametric distributions, exploring the impact of the superquantile constraint on 
overall tail density estimation by comparison to known benchmarks. Following this, 
Chapter 4 applies our method to samples from a financial data set known to possess 
heavy-tailed characteristics with comparisons made another commonly used density 
estimation method. Chapter 5 provides an application of the technique on a mixed data 
set of both high- and low-fidelity observations to show the method’s use in a 
hierarchical model framework. Finally, Chapter 6 provides conclusions from our findings. 


10 



CHAPTER 2: 
Methodology 


2.1 Analytic Framework 

We ultimately seek to estimate the underlying distribution of a data set for which limited 
observations exist. Epi-splines, shaped through multiple design constraints informed by both 
the data and additional soft information, and formulated to optimize either log-likelihood 
or entropy, provide the foundation of our approach. In general, the formulation follows the 
process outlined in the flow diagram in Figure 2.1. 


Figure 2.1: Density Estimation Methodology 



The generalized methodology for estimating probability density from a sample of data. Note 
the inclusion of both statistical information derived from the sample itself as well as soft 
information that can describe known tail behavior or other distributional characteristics. We 
hope that the additional steps shown via the dotted lines will help inform tail estimation. 


We begin with an IID set of univariate sample data x\, ...x n for which we wish 
to estimate the underlying density. From the data, we calculate sample statistics (such as 
mean, variance, median, and quartile values), as well as associated confidence intervals as 
per the methods outlined in section A.3 and section A.4. Second-order epi-spline segments 
that approximate the DSE of the underlying distribution are obtained using constrained 
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optimization subject to various constraints as outlined in section 2.3. The choice of second- 
order epi-splines allows us to leverage the rigidly defined shape constraints imposed on 
DSE functions without requiring an ultra-fine mesh. Additionally, as we will later see, the 
objective function for DSE approximation will require epi-splines of at least the second- 
order. From these epi-spline estimates, we acquire quantile and superquantile estimates 
for right-tailed p-values which we will subsequently incorporate into a second constrained 
optimization problem. This second optimization uses first-order epi-splines to approximate 
the underlying density function, using soft information to inform shape characteristics and 
optimized in the form of either a maximum log-likelihood problem (MLP) or a maximum 
entropy problem (MEP) as detailed in section 2.4. We elect first-order splines for this 
second optimization so as to enable a much finer mesh resolution subject to simpler con¬ 
straint formulations. This choice helps to balance computational run time and formulation 
complexity, though higher-order epi-splines could certainly be utilized if desired. 


2.2 Modeling Assumptions 

We assume all data are IID from an unknown, but well-defined, density function with both 
finite mean and variance structure. The distribution need not be parametric, as epi-splines 
are well suited to nonparametric estimation, however, without a finite second moment on 
the underlying distribution, some methods posited here will have an improper application 
due to the attempts to estimate both mean and variance for use in select constraints within 
the optimizations. Furthermore, the data is assumed to possess no time dependence, or 
other ordering structure. 

We further assume the data are free from measurement error or noise, making no 
attempt at deconvolution. If data is known or suspected of possessing Gaussian noise, 
application of a deconvolution constraint within the constrained density optimization can 
be applied as in [14]. 


2.3 Superquantile Estimation Formulation 

We intend to show how the estimation of a desired superquantile can be arrived at from a 
constrained optimization of second order epi-splines. Using known characteristics of the 
DSE and its relation to superquantiles, we conduct our constrained optimization to arrive 
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at estimates of E* x (p). We then convert these to estimates of Qx(p ) via the relationship 
outlined in Section 1.4. 

Given that we know E x {p) exists only for p e [0,1], we are able to freely define the 
length of our second-order epi-spline segments according to a desired mesh resolution. We 
refer back to subsection 1.5.1 in defining our epi-spline segments. 


2.3.1 Dual of Superexpectation Objective 

We desire conservative approximations for right-tailed densities such that our estimates for 
quantiles and superquantiles near p = 1 are at least as high as the actual values. To achieve 
this, we select an objective function that maximizes the curvature of the individual epi- 
spline segments, which we will term K(k), in order to achieve a shape similar to that seen in 
Figure 1.5b. In order to promote curvature over the entire mesh (rather that at a few 
individual segments) we apply an additional smoothing term that penalizes changes to ao 
between consecutive epi-splines. The degree of penalization is governed by a smoothing 
parameter p which we initialize at zero and increase as necessary to achieve smooth DSE 
estimates based on visual observation. Thus, given the second order of our epi-splines and 
our objective of maximizing curvature across the entire mesh, we arrive at an objective 
formulation defined by 


max 

ao, a\, a2 


0 - p)^K(k) 


k 



2a* 

x(k) =- 15 , 

(l + yj a i + 2a i ) 



Objective Function: 
Dual of Superexpectations 

Curvature Function 

Smoother Penalty 


We note here that maximization of the sum of curvatures displayed a reasonable 
shape response for estimating E*(p ), though other objective functions could reasonably be 
utilized. Values for the penalty parameter p ~ 0.01 - 0.1 were shown to work reasonably 
well when smoothing was required. 
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2.3.2 Constraints on E* x (p ) 

We impose constraints according to known criteria as well as external information as 
available. The following examines the significant constraints in formulations used here, 
though this does not constitute an all-inclusive list. 


Convexity, Continuity, and Differentiability 

From Theorem 1 of the characterization of superexpectations [11], we know E* x to be 
convex. We can therefore impose convexity on each epi-spline segment by requiring 

a\ >0 MkeK. (2.1) 

This requirement further ensures that all curvature values will be positive, thus avoiding 
any potential cancellation across the sum of segments. If we further require continuity 
across the segments by tying mesh endpoints together and impose equal slopes at these 
intersections, we enforce convexity across the entire epi-spline function. This also has the 
added benefit of providing a smoother curve. Again, for pr and pi corresponding to the 
right and left ends of the mesh segment accordingly, we arrive at 

a k 0 + a\p k R + a\ ( p k R ) 2 = a k+1 + a k+1 p k L +l + a k+1 ( p k L +1 ) 2 , (2.2) 

a\ + 2 a k p k R = a k+1 + 2 a k+l p k L +l . (2.3) 


Density Continuity 

If we assume the underlying distribution to have a continuous CDF, we can require ao to 
be strictly positive across the mesh. This, again, relies on the DSE-quantile relationship of 
Equation 1.6 so that if the quantile is constant across an interval p, there is a vertical jump 
at that quantile value on the CDF, as demonstrated in Figure 2.2. As such, we can tighten 
our convexity constraint to a strict inequality so that 

a\> 0 MkeK. (2.4) 
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Figure 2.2: Relationship of CDF to Quantile Function 



(a) CDF 



CDF and quantile plots for an arbitrary, but non-continuous distribution. Notice that in areas 
where Qxip) is constant there is a corresponding jump in the CDF that prohibits continuity. 


Endpoints 


From Equation 1.5 we know that 0) = - E[X ] and (1) = 0. Provided we may not 
know E\ X\ exactly, we can approximate it from our sample data as either a point estimate 
or as a confidence interval on the sample mean X. The Lower Confidence Bound (LCB) 
and Upper Confidence Bound (UCB) for X can be determined via a number of statistical 
techniques such as the student’s T -distribution or bootstrap sampling depending on a desired 
confidence level. As such, we can impose a starting point constraint at p = 0 depending on 
our knowledge of X and the degree of flexibility we wish to provide our formulation as one 
of either 


4 = E[X], (2.5) 

Xlcb <4 — Xucb , for k = 1. (2.6) 

Additionally, the end point is always fixed, thus providing a constraint for the right-most 
epi-spline segment as 

4 + 4 + 4 = 0 for k = K. (2.7) 
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Lower Bounds 

As per Equation 1.7, we have a lower bound on E* x {p) according to whether or not we 
have a known mean, known variance (cr(X)), or some combination thereof. We therefore 
impose a lower-bounding constraint as 


cIq + a k p + a k p 2 < (p - 1) 
cIq + a k p + a k p 2 < (p - 1) 


E[X] + 


CTv 


V 1 -pI 


X + 


V 1 -p/ 


( 2 . 8 ) 

(2.9) 


where s refers to the sample standard deviation. If E[X] or <r(X) are unknown, we can 
again estimate their values within some range of a confidence interval. Although there exists 
the corresponding upper bound for E*(p) as well, due to the requirement for convexity, it 
becomes a redundant constraint, and is therefore omitted. 


Quantile Constraints 

We recall from earlier that the slope of the DSE function is in fact the distribution’s quantile 
function as per Equation E6. As such, if we know quantile values for X at specific p, we 
can enforce 

^E* x {p) = a k + 2 a\p = Q x (p) for p e [p k L , p k R \, (2.10) 

within each epi-spline segment containing such p. Furthermore, although mean and standard 
deviation are notoriously difficult to estimate for small, asymmetric distributions, quantiles 
are robust to outliers and can be estimated quite easily via binomial confidence intervals [16]. 
Even in relatively small samples, reasonable intervals for the median, 25th, and 75th 
quantiles (p = 0.50, 0.25, and 0.75 respectively) can be arrived at for modest confidence 
levels. Utilizing this non-parametric technique, we modify our constraint to include quantile 
lower and upper bound estimates Q(p) for any desired p as 

Qlcb(p) < a\ + 2 a k p < Qucb(p) for p 6 [p k L , p k R ]. (2.11) 

For the remainder of this thesis, we use the 25th, 50th, and 75th quantiles to bound epi-spline 
slopes for DSE estimation. An illustration of this constraint, as well as those for the end 
points and lower bounding, can be seen in Figure 2.3. 
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Figure 2.3: Visualization of Constraints for DSE Optimization 


o 
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A graphical depiction of several constraints (green) imposed on the DSE optimization for an 
exponential random sample. The leftmost vertical confidence interval implies the approxi¬ 
mation of expected value, while the three slope constraints seen at p = 0.25, 0.50, and 0.75 
imply unknown but estimated quantile intervals. The lower bound constraint is informed 
through an estimation of both the sample mean and variance as per Equation 2.9. 

Tail Weight 

If the distribution is known to be heavy-tailed, meaning the tail decays sub-exponentially, 
we can apply an additional constraint by leveraging knowledge of the exponential quantile 
function. As identified in subsection 1.4.1, we know that the left derivative of E* x {p) is 
in fact the quantile value for the distribution. For the exponential case, we take the ratio 
of consecutive quantile values to obtain a relative rate of slope change for the epi-spline 
estimate. Heavy-tailed distributions will have a quantile function derivative that is bounded 
above by this change rate within the tail region, as exemplified in Figure 2.4. Thus, if we 
know or suspect heavy-tailed behavior, we enforce 


Q(p k L +1 ) _ log(l -p k R ) < a k + 2a k p k R 
Q{p k L ) log(l -p k L ) ~ a k +2a k p k L 


Vk e Tail Region, 



( 2 . 12 ) 
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Figure 2.4: Relative Gradient Change of Quantile Comparison 



Ratio of quantile values for the exponential (solid blue) and Pareto (dotted red) densities 
possessing equal means. Note the lower bounding performed by the exponential on the 
fat-tailed Pareto, particularly in the tail region right of p « 0.8. 


Minimal and Maximum Values 


Since we know that the distribution’s 0th quantile must be at most as small as the minimum 
observation, and that its 100th quantile must be at least as large as the maximum observation 
in the sample, we can develop constraints on the starting and ending slopes according to 
Equation 1.5, and implemented as 

a\ + 2a\ > max{xj} for k = K, (2.13) 

a\ < min{jc;} for & = 1. (2.14) 

Similarly, if we know the distribution’s minimum or maximum value (or perhaps some 
reasonable bound on it), we can apply these equations as equalities or inequalities on the 
known value in the form of 


= ma x{f(x)} 

for k = K, 

(2.15) 

= min {/(*)} 

for k = 1, 

(2.16) 

< f{x)lJCB 

for k = K, 

(2.17) 

> f(x)LCB 

for k = 1. 

(2.18) 
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Monotonicity 

We leverage the DSE-quantile relation as per Equation 1.6 in order to enforce a characteristic 
of quantile gradients in regions of monotonicity. Within the underlying density, any region 
of monotonic increase or decrease will have a corresponding quantile gradient that is 
negative or positive respectively. Secondly, given the second-order of our epi-splines, we 
know that ^ 

j^E* x = Q' x (p) = 2 a k for p € [p k L , p k R \. (2.19) 

Because we segment our mesh into evenly spaced segments, we combine these two relations 
to form a DSE constraint on ao enforced over regions of known or suspected monotonicity. 
This relationship is depicted in Figure 2.5 and is applied as 


a k > a k+1 Vk G Monotonically Increasing Region, (2.20) 

a k > ci k ~ l Vk G Monotonically Decreasing Region. (2.21) 


2.4 Density Estimation Formulation 

In contrast to the situation with DSE estimation where endpoints for the mesh p were 
predefined on the interval [0, 1], for density estimation we typically will not know the 
endpoints of the underlying distribution beforehand (if they are even finite). As such, our 
mesh x is defined by both a resolution M as well as endpoints l and u, as formulated in 
subsection 1.5.1. 


2.4.1 Density Objectives 

Given our choice of objective function as outlined earlier, the general formulations for 
density become 

n 


MLP: 

MEP: 


max 

f 


2]log (/(*/)) 

i =1 

S.t. f G F, 

(2.22) 

J -/(*)log (/(*)) dx 

S.t. f G F, 

(2.23) 
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Figure 2.5: Quantile Gradient Relationship to Monotonicity 



2a!)' 1 2a!) 2a!) +1 

The Weibull distribution from before with several quantile values plotted. Since quantiles are 
shown for evenly spaced p segments (0.50, 0.65,...) the corresponding areas under the curve 
between them are equal (in this case 0.15). Since the PDF is monotonically decreasing in this 
region, basic geometry tells us that the distances between our quantiles must increase over 
the same region. This increase directly corresponds to our change in a , across our epi-spline 
mesh. 


where the set of constraints F are determined by properties of density functions (non¬ 
negative, integrate to one), as well as soft information such as continuity and shape con¬ 
straints. With first-order epi-splines, we optimize our segment coefficients according to one 
of either 


MLP: 

n 

- max V log(/?jj' + b" l Xj) Vi, m \ Xj e m, 

bo, b\ 

1 = 1 

(2.24) 

MEP: 

= max [-(b™ + b™x)\og(b'* + b" l , x)dx 
bo, b\ J 



= maxV z^log(z^) + z^log(z^) + 2(z^ + z”)log 1 — I . 

\ 2 /J 

(2.25) 


Here we use z 1 ^ = b™ + b r "x" f ^ and z'j' = b™ + Z/” x™ as substitution variables, and make 
use of Simpson’s Rule for approximating the integration contained in the MER Given 
that Simpson’s Rule provides exact results for polynomials of degree three or lower, we 
implement it in later quantile and superquantile constraints to simplify terms. 
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2.4.2 Constraints on Density 

We provide the minimal set of constraints that define a valid PDF, as well as those additional 
constraints dealing with quantile and superquantile estimates. Many other shape constraints, 
or soft information, such as monotonicity, unimodality, and convexity can be incorporated 
depending on one’s knowledge of the underlying distribution. These methods are utilized 
throughout the remainder of this study, though the exact constraint derivations are omitted 
for the sake of brevity. Many of these derivations can be found in [14]. 


Unity 

By definition, the density function must integrate to one. We can formulate this integrality 
constraint in linear form, where A v is the mesh resolution (u - 1)1 M, shown here as 



(2.26) 


Continuous 

We further assume densities explored here come from the family of continuous functions 
so that 


h m 

°o 


+ b™; 


’777 + 1 


yn+1. 


+ b'r' ~x 


m +1 
L 


im I m < M. 


(2.27) 


Non-Negative 

Finally by definition of a density function, our estimates must contain all non-negative 
values such that 


b™ + b™x m L > 0 im g M, 
b™ + b'l'x'W > 0 im g M. 


(2.28) 

(2.29) 


Quantile Constraints 

If we know (or believe we have accurately estimated) the value of a particular quantile, we 
can, in much the same way that we formulate the unity constraint (Equation 2.26), require 
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that some percentage of density occupy the region left or right of the value itself. This works 
for not just quantiles estimated from the data itself, which we used to inform Equation 2.11, 
but also for quantile values closer to p = 0 or p = 1 by estimating the slope of the DSE 
function. In either case, quantile constraints for a specific p can be implemented as 



m\x™>Q(p) 


h m 

°o 


+ b" 


+ x. 


~ P- 


(2.30) 


Superquantile Constraints 

We wish to incorporate a constraint utilizing our estimate of the superquantile(s) derived 
from the previous optimization. Using the definition of superquantile outlined in Equa¬ 
tion 1.3, and leveraging Simpson’s Rule to simplify the expression, we arrive at a superquan¬ 
tile constraint for a particular quantile p as 

7 —— V ^ h( X ) = Q(p ) (2.31) 

1 - p t—L 6 

m\x™>Q(p) 

h(x) = 3 b'"{x"l + x m R ) + b\ l ((.v " ! ) 2 + (.v™) 2 ) + b?(x™ + x'D 2 (2.32) 

which remains convex in bo and b\ . The corresponding quantile value Q(p) can be estimated 
if unknown. One may also recall that since the superquantile at p = 0 is in fact the expected 
value, we see that the above formulation doubles as an expected value constraint for m 
summed across the entire mesh. 

2.4.3 Quantile/Superquantile Constraint Implementation 

In subsection 2.4.2, we formulated our quantile and superquantile constraints as equalities. 
Given the approximate nature of our quantile/superquantile values from DSE estimation, in 
practice we implement our constraints as either lower or upper bounds, the choice of which 
depends on the objective function for density (MLP vs. MEP) as well as any assumption 
made regarding a potential bias in estimates. 

Typically, MLP formulations utilize lower bounding so as to “push” density into 
the tails, while MEP formulations use upper bounding to “rein in” density from the tails. 
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Conversely, if one believes the values attained though DSE approximation to overestimate 
the true superquantile values then perhaps the implementation of superquantile constraint 
as an upper bound may be more appropriate. When not otherwise stated, we assume that all 
MLP formulations incorporate expected value, quantile, and superquantile constraints via a 
lower-bounding of right-tailed densities, while MEP formulations use an upper-bounding of 
right tailed densities. With this in mind, we present the following quantile and superquantile 
constraints, where both Q(p) and Q(p) are the results of DSE estimation in the previous 
step for quantiles and superquantiles respectively. 



m\x™>Q(p ) 



m\x™>Q{p) 


b™ + V? 


y m , m 

a r f a l 


Y m , r m 
b m + b m I R ■ L 


< 1 — p 

> 1 — p 


(Quantile Upper Bound) 


(Quantile Lower Bound) 


(2.33) 

(2.34) 


j-— V ^~h(x) < Q(p) 

1 - p 4—7 6 

m\x”>Q(p) 

rr— V, ^rHx) > Q(p) 

1 - p t—L 6 


in\x'">Q(p) 


(Superquantile Upper Bound) (2.35) 


(Superquantile Lower Bound) (2.36) 


Given the possibility of inaccurate and perhaps even infeasible quantile and su¬ 
perquantile estimates, we further introduce a highly-penalized set of slack variables by 
which we elasticize the constraint bounds to ensure feasibility in the solution space. In 
nearly all cases, these slack variables remain zero. 


2.5 Approximation Metrics 

In order to assess the validity of our method, we must quantify the error of our estimates 
numerically. For evaluation of the quantile and superquantile estimates, we use the average 
absolute deviation (AAD) and the median absolute deviation (MAD) between the estimated 
and known values at various p across all optimization iterations (j 6 /). Here we evaluate 
our DSE estimation results at p = 0.80, 0.90, 0.95, and 0.99. Since we are dealing with 
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heavy-tailed densities, we prefer two centrality measures so as to assess the impact of 
potential outlier skewing. Thus, we formulate our DSE error metrics as 


AAD(p) = | Q(p) - Q(p) 

j 


MAD(p) = median j 


Q(P ) - Q(p) 


with quantile errors calculated in the same manner. 


For density estimation, we also propose two metrics. The first is a measure of overall 
fit, where we sum the squared errors (SSE) at epi-spline segment endpoints across the entire 
mesh. The second metric uses the same measure, but only across endpoints within the tail 
region. As such, we term this the sum of squared tail errors, or SSTE, and use the 80th, 
90th, and 95th quantiles to define the start of right tail regions according to 

SSE = J](f(x)-f(x)) 2 , 

m 

SSTE - Yj (/(*) - / W) 2 • 

m\x™>Q(p) 
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CHAPTER 3: 
Distributional Benchmarks 


In order to evaluate and quantify the impact of accurate superquantile estimation within 
the density estimation framework, we evaluate the methodology posited in Chapter 2 by 
considering samples from known distributional benchmarks. We explore two distributions, 
the exponential and the Pareto, making use of their well-defined density and quantile 
functions in order to measure estimation errors in quantiles, superquantiles, overall density, 
and tail density. 


3.1 Exponential Superquantile Estimation 

We begin with the exponential case, where a comparison is relevant because an exponential 
decay in the tail density serves as the boundary case for classification of heavy tails. Thus, 
this exploration serves as a validation for such data that might just barely be considered 
heavy-tailed. We begin by taking a sample of 30 IID observations, randomly generated 
from an exponential distribution with a rate of one-fifth. 

xi, X 2 ,*30 ~ fx = Ae~' lx , for A = ^ 

Given knowledge of the underlying distribution, we can compare the results attained 
via epi-spline estimates with the true values for quantiles and superquantiles at specific 
p-valucs. From the PDF, we derive the following quantile function Qx(p), superquantile 
function Qx(p), and DSE function E* x {p) for the exponential case. Probability density, 
quantile, and DSE plots for the exponential case can be seen in Figure 1.3 and Figure 1.5. 

Qx(p) = “ln(l ~P) (3-1) 

Qx(p) = [1 - In (1 - p)] (3.2) 

E* X (P) = (p-l)Qx(p) = ^ [1 -ln(l -p)] (3.3) 
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With our random sample, we apply different combinations of constraints that con¬ 
stitute varying degrees of distributional knowledge and soft information by way of four 
progressive scenarios. As the scenarios progress, we gradually increase distributional 
knowledge in a manner that seems reasonable. Scenario 1 serves as a base case, with no 
prior knowledge outside of what we can deduce from the sample itself. Using bootstrapping, 
we obtain 95% confidence intervals for the mean and standard deviation (as per section A.3), 
as well as 95% confidence intervals for the median and quartiles using binomial approx¬ 
imations (as per section A.4). In Scenario 2, we acquire knowledge of the distribution’s 
median as well as the “heavy-tailed” characteristic. In Scenario 3 we learn its median 
and quartile values, as well as the fact that X cannot be negative (i.e., a minimum value 
threshold). Finally, given the difficulty in their estimation, we wait until Scenario 4 to add 
perfect knowledge of the distribution’s mean and variance, which are used to inform both 
the starting conditions and the lower bounding constraint. We also add the knowledge that 
the functions is monotonically decreasing. These constraint formulations are summarized 
with their associated equations in Table 3.1. 


Scenario 

Constraints 

Equations 

DSE 1 

Continuous 

Equation 2.2 


Differentiable 

Equation 2.3 


Convex 

Equation 2.1 


Endpoint 

Equation 2.7 


Startpoint 

Equation 2.6 


Lower Bound 

Equation 2.9 


Quartiles 

Equation 2.11 


Min. Value 

Equation 2.14 


Max. Value 

Equation 2.13 

DSE 2 

+ Median Known 

Equation 2.10 (p = 0.5) 


+ Heavy-Tailed 

Equation 2.12 

DSE 3 

+ Quartiles Known 

Equation 2.10 (p = 0.25, 0.75) 


+ Minimum Value 

Equation 2.18 (x > 0) 

DSE 4 

+ Mean Known 

Equation 2.5 


+ Variance Known 

Equation 2.8 


+ Monotonic Decrease 

Equation 2.21 


Table 3.1: Constraint Configuration Scenarios for Estimating DSE 


For each scenario, 100 iterations of the optimization are performed, with each 
iteration using a different randomly generated sample of 30 observations and a mesh of 
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K = 20 segments. The results of these scenario replications are summarized in Table 3.2. 
For the objective function, we begin with p = 0, incrementally increasing the penalty term 
if results are determined to require additional smoothing. For the exponential case, no 
smoothing was required. 


Quantiles 

<2x(0.80) 

<2x(0.90) 

<2x(0.95) 

Qx( 0-99) 

Scenario 1 

1.129 

2.397 

4.637 

9.426 

Scenario 2 

2.659 

4.000 

2.604 

5.425 

Scenario 3 

0.943 

1.688 

2.334 

7.038 

Scenario 4 

0.313 

0.318 

0.271 

1.771 

Superquantiles 

<2x(0.80) 

<2x(0.90) 

<2x(0.95) 

Qx( 0-99) 

Scenario 1 

3.496 

5.586 

7.623 

7.190 

Scenario 2 

3.529 

3.791 

4.261 

4.152 

Scenario 3 

2.524 

3.718 

5.287 

4.919 

Scenario 4 

0.096 

0.473 

1.037 

2.514 


Table 3.2: AAD Estimation Errors for 100 Iterations on Exponential Benchmark 


We see from Table 3.2 that as the degree of distributional knowledge increases, 
the accuracy of both quantile and superquantile estimates generally improves. Somewhat 
intuitively, estimates further in the tail (closer to p = 1) are less accurate than those nearer 
the center. Additionally, we find a significant jump in accuracy for Scenario 4, suggesting 
that knowledge of the mean and variance is valuable in both quantile and superquantile 
estimation. A plot of the 100 DSE epi-spline estimates and their associated superquantile 
curves for Scenario 2 is provided in Figure 3.1. Additional scenario graphics are included 
in Section A.6. 


3.2 Exponential Density Estimation 

With the estimates of Qx and Qx for p = 0.80, 0.90, 0.95, and 0.99, we now perform 
a second constrained optimization to estimate the probability density of the underlying 
distribution. As before, we constrain the optimizations to correspond to the same levels 
of distributional knowledge as in the DSE scenarios. For example, Scenario 3 here would 
assume perfect knowledge of median and quartiles in the same manner as Scenario 3 
during DSE estimation. In this way, the scenarios for density estimation represent a logical 
progression within the overall methodological framework. 
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Figure 3.1: DSE and Superquantile Estimates for Exponential Scenario 2 


25 

20 

15 

10 


Dual of superexpectations (left) and superquantile (right) second-order epi-spline estimates 
for the exponential case using constraints via Scenario 2 and optimized for curvature. The true 
values are shown in solid red. Notice the generally conservative estimates made, regularly 
overestimating the superquantiles in the range near p - 1. 

For each scenario we optimize on the basis of both MLP and MEP. In addition, we 
run each optimization twice. In the first iteration, we ignore the quantile and superquantile 
estimates derived in the previous section. In the second iteration we apply these estimates 
for p = 0.80, 0.90, 0.95, and 0.99 so as to assess the impact of the previous step. This 
second iteration is denoted by a star (“*”) in the scenario name, indicating that it uses 
E* x (p) optimization to obtain additional constraints in density estimation. In total, this 
requires 16 formulations, each with 100 iterations using different sample data, for 1,600 
total optimizations. We recall that constraints for mean, quantile, and superquantiles are 
bounded as per subsection 2.4.3 with MLP utilizing lower bounding and MEP utilizing 
upper bounding. 

Due to potential inaccuracies in sample estimates and quantile/superquantile esti¬ 
mates from DSE optimization, for any constraint reliant upon “less than perfect” knowledge 
we incorporate a highly penalized slack variable so as to ensure feasibility throughout the 
100 iterations. Finally, soft information is applied consistently throughout scenarios as per 
Table 3.3. 

The results of our 16 scenarios are depicted in Figure 3.2, which shows the SSE 
across the entire mesh for both MLP and MEP formulations and averaged across the 100 



(b) Scenario 2 Superquantile Estimates 



0.0 0.2 0.4 0.6 0.8 1.0 

P 


(a) Scenario 2 DSE Estimates 
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Soft Information for Exponential PDF Estimation 
Unity, Continuous, Non-Negative 
Lower/Upper Limits: [0, 60] 

Monotonic Decrease: x > 0 
Convex Right Tail: x > 0 
Max. Gradient Change: Ab\ < 0.01 

Table 3.3: Soft Information for Exponential Density Estimation 


Figure 3.2: SSE Results for Exponential Benchmark 



Scenario 

Sum of Squared Errors (SSE) for first-order epi-spline estimates evaluated across all mesh 
segments for both MLP and MEP formulations and averaged across 100 iterations. Results 
show an improvement as distributional knowledge increases, and when quantile/superquantile 
estimates are incorporated. 


iterations. We note two general trends. First, as scenarios increase in their level of 
distributional knowledge there is a reduction in overall density error, the degree to which 
seems to depend on objective formulation. Second, the inclusion of estimated quantile and 
superquantile values causes a noticeable decrease in overall error, especially in early MEP 
scenarios. A more detailed numerical summary of these results is included in Table A.l. 

The results of density estimation across the 100 iterations for Scenario la* is 
provided in Figure 3.3. We include here a second plot on a log-scale to better visualize the 
tail region estimates. The plots of additional scenarios are provided in section A.6. 
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Figure 3.3: PDF Estimates for Exponential Scenario la* 




(a) Scenario la* PDF (b) Scenario la* PDF Log-Scale 

Density estimates (left), also shown on a log scale (right), for second-order epi-spline estimates 
for Scenario la* of the exponential benchmark. The true values are shown in solid red. 


3.3 Pareto Superquantile Estimation 


For our second benchmark, we explore a distribution known to possess a heavy tail, the 
Pareto. Using the same methodology as before, we begin with a sample of 30 observations 
taken from a Pareto distribution with shape parameter (a) of three and a scale parameter 
(x m ) of ten. 

X\, X 2 ,.... X 30 ~ fx = 7 - for a = 3, x m = 10 

(x + x m ) a 


From the probability density function, we attain the following quantile function 
Qx(p), superquantile function Qx(p), and DSE function E* x {p ) for the Pareto case. We 
again reference Figure 1.3 and Figure 1.5 of Chapter 1 for plots of probability density, 
quantile, and DSE functions. 


Qx(p) = 


(l -p) l i° 
CXX m 


(3.4) 


Qx(p) = 


(a - 1)(1 - p) 1 /® 


- x 


(3.5) 











3.3.1 Superquantile Estimation 

Using the same constraint scenarios as outlined for the exponential case, we arrive at the 
following results summarized in Table 3.4. We first notice the considerable increase in 
quantile and superquantile estimation errors, particularly for higher /?-values. Still though, 
we continue to see the general trend of decreasing errors as the scenarios progress. Scenario 
2 epi-spline estimates are provided in Figure 3.4, where we see similar results to those of 
the exponential case. 


Figure 3.4: DSE and Superquantile Estimates for Pareto Scenario 2 




(a) Scenario 2 DSE Estimates (b) Scenario 2 Superquantile Estimates 

Dual of superexpectations (left) and superquantile (right) second-order epi-spline estimates 
for Scenarios 2 for the Pareto case. The true values are shown in solid red. 


Quantiles 

<2x(0.80) 

<2x(0.90) 

<2x(0.95) 

<2x(0.99) 

Scenario 1 

2.107 

4.952 

9.347 

13.474 

Scenario 2 

1.968 

2.184 

2.181 

11.176 

Scenario 3 

0.882 

0.801 

2.405 

13.032 

Scenario 4 

1.162 

1.064 

2.091 

5.929 

Superquantiles 

<2x(0.80) 

<2x(0.90) 

<2x(0.95) 

<2x(0.99) 

Scenario 1 

6.229 

9.222 

11.575 

22.412 

Scenario 2 

2.768 

4.873 

8.577 

27.326 

Scenario 3 

2.841 

5.436 

9.743 

28.872 

Scenario 4 

0.378 

1.866 

4.531 

22.092 


Table 3.4: AAD Estimation Errors for 100 Iterations on Pareto Benchmark 
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3.4 Pareto Density Estimation 

We again run our optimizations across the 16 formulations, plotting density estimates and 
recording errors for each of the 100 iterations. We modify our mesh, expanding it from 
[0, 60] (as in the exponential case) to [0, 110] so as to encompass all X observations. All 
other soft information remains as before. The numerical results of these error calculations 
is provided in Section A.6 and plotted in Figure 3.5. Scenario la* again serves as an 
illustrative case for optimization on the basis of log-likelihood with minimal information, 
where we again include a log-scale plot to highlight estimation in the tail region (Figure 3.6). 

Figure 3.5: SSE Results for Pareto Benchmark 


MEP MLP 



1 2 3 4 1 2 3 4 

Scenario 


Sum of Squared Errors (SSE) for first-order epi-spline estimates evaluated across all mesh 
segments for both MLP and MEP formulations and averaged across 100 iterations. In contrast 
to the exponential case, the addition of additional quantile/superquantile constraints doesn’t 
appear to help as much in overall density estimation, at least from a quantitative perspective. 


3.5 Benchmark Summary 

The exponential and Pareto cases provided somewhat mixed results. On one hand, the 
exponential benchmark seems to suggest that the addition of quantile and superquantile 
constraints informed though DSE approximation can be quite beneficial, particularly for 
MEP formulations that have access to very little distributional knowledge (as in Scenarios 
1 or 2). Contrast this to the Pareto cases, where the addition of these same constraints 
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Figure 3.6: PDF Estimates for Pareto Scenario la* 




(a) Scenario la* PDF (b) Scenario la* PDF Log-Scale 

Density estimates (left), also shown on a log scale (right), for second-order epi-spline estimates 
for Scenario la of the Pareto benchmark across 100 iterations. The true values are shown in 
solid red. 

seemed to have almost no impact. This, however, might be slightly misleading. Recall that 
the error of quantile/superquantile estimates for the Pareto case was higher than that of the 
exponential. If predicted values possess significant error, then it follows that their inclusion 
in an density estimation framework could actually be detrimental rather than helpful. For 
this reason, we do not rely purely on a numerical evaluation of the technique’s efficacy, but 
also include plots displaying the fidelity of the estimates themselves for visual comparison. 
Again, these can be found in Section A.6. 


3.5.1 Hypothetical Constraint Implementation 

One trend that remains consistent, however, is the improvement in accuracy as we move from 
Scenario 1 to 4. The addition of new information into the density estimation framework 
improves overall estimation considerably, with perhaps the most noticeable improvement 
occurring between Scenarios 2 and 3, as seen in Figure 3.2 and Figure 3.5. In light of this 
observation, we feel confident that quantile and superquantile constraint incorporation can 
improve density estimates provided the constraints are near the true quantile/superquantile 
values. To assess this, we provide a fifth scenario, showing MLP and MEP formulations 
utilizing a perfect knowledge of the quantile/superquantile constraints used in Scenario 1 
and without any other additional information. We refer to these scenarios as run “P,” with 
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results depicted in Figure 3.7 and summarized in Table 3.5. 


Figure 3.7: PDF Estimates with Perfect Quantile Knowledge 




(a) Exponential MLP (b) Exponential MLP on Log-Scale 




(c) Pareto MEP (d) Pareto MEP on Log-Scale 

Scenario 1 runs for both the Exponential case (MLP) and Pareto case (MEP) given a hy¬ 
pothetical perfect knowledge of quantile/superquantile values for p = 0.80, 0.90, 0.95 and 
0.99. Compare these to the results shown earlier and it becomes apparent that accurate 
quantile/superquantile estimates can greatly enhance density optimization. 

Although improvement is not made in every case, we generally find that MEP 
formulations are greatly improved by adding accurate quantile/superquantile constraints. 
Aggregating the errors across both cases and formulations we calculate an overall improve¬ 
ment in SSE of roughly 25% between Scenarios 1 and P, and an overall improvement in 
SSTE (averaged across p = 0.80, 0.90, 0.95) of 15% for MLP and greater than 80% for 
MEP. 
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Scenario 

SSE 

SSTE (0.80) 

SSTE (0.90) 

SSTE (0.95) 

Exp la 

0.0127 

2.11e-03 

l.lle-03 

5.43e-04 

Exp la* 

0.0118 

1.52e-03 

8.36e-04 

4.59e-04 

Exp Pa 

0.0119 

1.70e-03 

9.52e-04 

4.82e-04 

Exp lb 

0.0283 

2.94e-03 

2.66e-03 

2.06e-03 

Exp lb* 

0.0079 

0.55e-03 

2.46e-04 

1.71e-04 

Exp Pb 

0.0064 

0.28e-03 

8.64e-06 

1.96e-06 

Par la 

0.0253 

1.85e-03 

6.45e-04 

1.85e-04 

Par la* 

0.0251 

1.39e-03 

5.20e-04 

2.16e-04 

Par Pa 

0.0243 

1.23e-03 

5.28e-04 

2.47e-04 

Par lb 

0.0183 

0.84e-03 

3.38e-04 

8.03e-05 

Par lb* 

0.0168 

0.75e-03 

2.92e-04 

7.80e-05 

Par Pb 

0.0164 

0.21e-03 

3.54e-05 

l.lle-05 


Table 3.5: Impact of Perfect Quantile Knowledge on Error 


3.5.2 Observations 

The use of second-order epi-splines provides a relatively simple way to leverage a curvature 
objective function while requiring relatively few mesh segments (only 20 here). While this 
eases computation, it also limits the degrees of freedom afforded the optimization solver 
and as such, limits the flexibility of the epi-spline estimates. This issue came to light in 
trying to impose multiple simultaneous constraints that exhausted the degrees of freedom 
provided by second-order splines, resulting in infeasibility issues. 

One could of course simply increase the complexity of the epi-spline model through 
enhancing the degree of the polynomial. This would enable a more flexible fit that could 
accommodate more elaborate constraint combinations at the expense of increased compu¬ 
tational cost. Alternatively, one could also simply add slack variables to those constraints 
deemed desirable, though not necessarily required. This approach would have the added 
benefit of providing the marginal costs of taught constraints, which provides the intuitive 
appeal of relating to a confidence in imposing the constraint. As such, that was the method 
pursued here. 
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CHAPTER 4: 

Non-Parametric Financial Data 


Having observed the impact of DSE estimation and the relative merit of the method on 
known distributional benchmarks, we now turn our focus to the financial sector, where data 
sets are known to commonly exhibit heavy-tailed characteristics. We obtain this data from 
Dr. Uryasev, a professor and Director of the Risk Management and Financial Engineering 
Lab at the University of Florida. The data includes 1,000 observations of both positive and 
negative values of a particular financial measure, which we will term “cost.” 

4.1 Data Evaluation 

We begin with a summary of the raw data Y, shown in Table 4.1, as well as a PDF estimate 
derived using the density function from the R base-package stats, which we will term d{Y), 
across all 1,000 observations. Here, the density is estimated using kernel smoothing as 
described in [17]. Many other density estimation packages are available, with a comparison 
of efficiency and accuracy explored in [18]. 


Statistic 

Minimum 

1st Quartile 

Median 

Mean 

3rd Quartile 

Maximum 

Y 

-603.40 

-127.90 

-10.33 

7.44 

118.80 

1095.00 

d{Y) 

-728.30 

-129.89 

-11.71 

7.44 

125.51 

1219.40 


Table 4.1: Summary Statistics of Financial Data 


We assume the data to have come from some continuous, but unknown distribution. 
With 1,000 observations, the PDF estimate arrived at by d(Y) provides us with a qualitatively 
acceptable approximation to the true distribution within the range of observed values. 
We realize, however, that d{Y) is not “the true” distribution, and that its approximations 
(particularly in the extreme tail regions) will invariably be flawed. Without the underlying 
ground-truth, a true numerical evaluation of results (as in Chapter 3) remains ill-posed. As 
such, we focus the majority of this chapter on a qualitative evaluation of density, avoiding 
quantitative comparisons of tail regions. 

Inspecting the data, we suspect (as per Figure 4. la) that the right tail of the data may 
be heavy-tailed. In fact, if we look only at the density to the right of the mode (which in this 
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case is roughly zero) and rescale accordingly, we see that by comparison with an exponential 
(Figure 4. lb), the data does appear to possess a heavy tail. Though in reality an investigation 
such as this would not be possible (as we are using all 1,000 observations), we include it 
here to show that our data can reasonably be considered heavy-tailed, a specification we 
earlier cited in the outline for this approach. 

Figure 4.1: Financial Data Density Analysis 




(a) Finance Data PDF (b) Finance Data Tail Region 

The PDF (left) and tail density (right) for the financial data. The tail density here is compared 
with an exponential distribution using MLE to estimate the rate parameter and plotted on a 
log-scale. We see evidence here that the data can be considered heavy-tailed. 


4.2 Quantile and Superquantile Estimations 

We intend to evaluate the efficacy of our approach on small data sets for which limited 
knowledge is available. As such, we depart from the scenarios outlined in Chapter 3 to rely 
solely on information derived from the sample data itself. 

We begin by taking 30 randomly sampled observations (without replacement) from 
the original data and calculating summary statistics in the same manner as Chapter 3. Doing 
this over 100 replications, we apply constraint equations that closely relate to Scenario 1 
from earlier, and outlined in Table 4.2. We again optimize for curvature, in this case 
utilizing a smoothing parameter of p - 0.01. The results of the 100 replications for 
quantile/superquantile predictions are summarized in Table 4.3, along with their associated 
standard errors (SE). We also include the quantile and superquantile values derived from 
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both Y andc/fK) as a means of comparison. From these estimates, and from visual inspection 
of Figure 4.2, we find our method routinely overestimating both quantile and superquantile 
values. Recall, however, the curvature objective function was implemented with the intent 
of achieving conservative predictions. 


Constraints 

Equations 

Continuous, Differentiable, Convex 
Startpoint, Endpoint, Quantiles 

Min. Value, Max. Value 

Lower Bound, Heavy-Tailed 

Equation 2.2, Equation 2.3, Equation 2.1 
Equation 2.6, Equation 2.7, Equation 2.11 
Equation 2.14, Equation 2.13 

Equation 2.9, Equation 2.12 


Table 4.2: Constraint Configuration for Financial DSE Estimation 


Quantiles 

£,(0.80) 

e*(o.90) 

£,(0.95) 

£,(0.99) 

Y 

158.33 

278.49 

419.09 

699.71 

d(Y) 

163.63 

289.41 

430.44 

750.61 

Epi-Spline Est. 

288.89 

414.43 

537.25 

944.34 

Epi-Spline SE. 

69.10 

94.82 

124.42 

321.01 

Superquantiles 

£,(0.80) 

e*(o.90) 

£,(0.95) 

£,(0.99) 

Y 

340.44 

472.81 

604.03 

861.88 

d{Y) 

343.17 

471.17 

598.49 

781.69 

Epi-Spline Est. 

490.27 

633.76 

791.68 

995.23 

Epi-Spline SE. 

110.91 

145.55 

187.39 

246.03 


Table 4.3: Average Epi-Spline Quantile/Superquanile Estimates 


4.3 Density Estimates 

Using the quantile and superquantile estimates previously derived, we again estimate the 
financial PDF under constrained optimization of both MLP and MEP as in Chapter 3. 
Additionally, we apply soft information for all formulations given on Table 4.4. 


4.4 Comparison of Methods 

We compare the results achieved with epi-splines to densities obtained using kernel smooth¬ 
ing on 100 replications, each with a different sample of 30 observations. Sample kernel 
estimates are again obtained using the density function from the stats package in R. For 
kernel smoothing, we enforce mesh endpoints and resolution to match that of the epi-spline 
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Figure 4.2: DSE and Superquantile Estimates for Financial Data 




(a) Dual of Superexpectations (b) Superquantiles 

DSE and superquantile estimates for the financial data samples of size 30. Constrained 
optimization is performed as per Table 4.2. The red line corresponds to the d(Y) kernel 
estimate across the entire data set. We can see the general overestimation performed by the 
objective curvature function on the superquantile plot, leading to conservative predictions. 


Financial Soft Information 
Unity, Continuous, Non-Negative 
Lower/Upper Limits: [-1,000, 1,500] 

Unimodal Inflection Points: x = -250, 250 
Convex Right Tail: x > 250 
Convex Left Tail: x < -250 
Mode: max{/(x)} e [-50,50] 

Minimize Gradient Change: Ab\ < 2e - 06 

Table 4.4: Constraint Formulations by Scenario for Density Estimation 


estimates. This allows for direct comparison of density estimates at each mesh intersec¬ 
tion. The results of these kernel smoothing estimates can be seen in Figure 4.3b, with a 
comparison of estimation errors summarized in Table 4.5. 

With few assumptions made on the data, we can arrive at decent approximations 
that are comparable to those estimates made via naive kernel smoothing. If more soft infor¬ 
mation becomes available, epi-spline estimates can leverage the additional knowledge for 
incrementally better approximations. As an illustrative example of this, we run both meth¬ 
ods over a single sample using an epi-spline MEP enhanced with an accurate knowledge of 
select superquantile values. In this case, our “accurate” superquantile values are those found 
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Figure 4.3: Epi-Spline vs. Kernel Smoothing Density Estimation 




(a) MLP* Epi-Spline Estimates (b) Kernel Smoothing Estimates 

Density estimates for second-order epi-splines under MLP* (left) and kernel smoothing 
(right). The d(Y ) estimate is shown in solid red. 


Estimation Method 

SSE 

SE 

Epi-Spline MLP* 
Epi-Spline MEP* 
Kernel Smoothing 

3.1720e-06 

7.5303e-06 

3.9536e-06 

2.3709e-06 

3.0263e-06 

3.0835e-06 


Table 4.5: PDF Estimation Error for Financial Data 


in Table 4.1 for Y. The results are shown in Figure 4.4 and demonstrate the improvements 
possible with accurate quantile/superquantile estimates, particularly in the tail region. We 
recall that the density approximation of d(Y) in solid black is not the “true” density, and 
that in all likelihood, the much smoother epi-spline estimate seen in the log-scale plots are 
probably better reflections of the true underlying tail density. 
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Figure 4.4: Comparison of PDF Estimation Methods 




(a) Superquantiles Estimated (b) Superquantiles Known 




(c) Qx Estimated (Log-Scale) (d) Qx Known (Log-Scale) 

A comparison of density estimation using MEP optimized epi-splines informed through quan¬ 
tile and superquantile approximations (red), and kernel smoothing (blue). Both methods are 
used on a sample of 30 observations. With minimal knowledge, quantiles and superquantiles 
can be approximated and incorporated with additional soft information. As the fidelity and 
scope of knowledge grows, estimates improve. The left figures use quantile and superquantile 
estimates corresponding to Scenario 1 in DSE estimation. If these estimates were 100% 
accurate, the improved estimates would correspond to the figures on the right. Bottom 
figures are plotted on a log-scale and enhanced to show the right tail region. 
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CHAPTER 5: 

Multi-Fidelity Hydrofoil Data 


We now investigate a multi-fidelity data set that comes to us from Dr. S. Brizzolara of the 
MIT SeaGrant program. The data contains the output of 878 runs of both high-fidelity and 
low-fidelity fluid dynamic models that simulate a particular hydrofoil concept’s drag-to-lift 
ratio. Since computational cost can vary significantly between low-fidelity vs. high-fidelity 
simulation runs, we wish to explore the accuracy of density estimates that rely on only a 
few high-fidelity observations supplemented with many low-fidelity runs. A more detailed 
exploration of multi-fidelity modeling and its applications to fluid dynamics simulations 
can be found in [19]. 


5.1 Hierarchical Model Blending 

In this endeavor, we explore three options for informing the quantile and superquantile 
estimates that ultimately help shape our PDF approximations. In the first, we use only 
a small sample of high-fidelity observations, running a two-stage optimization in much 
the same way as Chapters 3 and 4. Next, we look at the correlated relation between the 
low-fidelity and high-fidelity observations, forming a linear model which we then use to 
predict high-fidelity quantiles and superquantiles from low-fidelity observations. Finally, 
we look at a blended model which uses a combination of the previous two methods. As our 
basis for comparison, we fit a kernel density across 875 of the 878 entries (we omit three 
as outliers) for both the high and low fidelity outputs. We will henceforth loosely refer to 
these as the kernel estimates for high- or low-fidelity data respectively. These distributions 
and their associated sample data are displayed in Figure 5.1. 


5.2 High-Fidelity Modeling 

With the 10 observation sample of high-fidelity simulation runs, we calculate the sample 
statistics and bootstrapped confidence intervals as in Chapter 3. Performing the first opti¬ 
mization using the same constraint formulations provided in Table 4.2, we obtain the results 
shown in Figure 5.2, where we see fairly severe overestimation of superquantiles. 
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Figure 5.1: HydrofoiI Data Inspection 



Drag/Lift Ratio 

Densities of the two hydrofoil data sets formulated via kernel smoothing across 875 obser¬ 
vations. We use 10 high-fidelity (green) and 300 low-fidelity (orange) samples respectively. 
Note the unimodality of the low-fidelity kernel estimate vice the dual-modality of the high- 
fidelity estimate. 


Number of High-Fidelity Samples: /;/, = 10 
Number of Low-Fidelity Samples: ii/ =300 


In addition to quantile predictions for upper tail regions, DSE functions can (as per 
Equation 1.6) also be used for quantile predictions in the lower tail regions, at least in theory. 
Although the fidelity of this approach remains unclear, we will attempt it here so as to obtain 
quantile estimates for regions where they cannot be reasonably attained from the sample 
itself. In this way, we estimate additional quantile values for p = 0.05, 0.10, and 0.20, 
including them as bounded quantile constraints within the density estimation formulation. 

Using the estimates attained, we approximate density functions using the following 
combination of sample statistics, predicted quantiles/superquantiles, and additional soft 
information shown in Table 5.1. We elect here to optimize on the basis of log-likelihood 
rather than entropy due to the small sample size so as to provide further emphasis for 
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Figure 5.2: High-Fidelity Quantile Estimation 



(a) Dual of Superexpectations (b) Superquantiles 

Results of quantile estimation on the high-fidelity sample data. Notice the log-scale used 
to display DSE (left). Although the DSE appears to be well-estimated, we see from the 
superquantiles (right) that we are overestimating virtually all superquantiles beyond the mean. 


density around the high-fidelity observations. Results are shown with and without the 
implementation of the quantile and superquantile constraints in Figure 5.3. 


HSV Soft Information 
Unity, Continuous, Non-Negative 
Lower/Upper Limits: [0.083, 0.088] 

Unimodal Inflection Points: x = 0.0825, 0.0845 
Convex Right Tail: x > 0.0857 
Minimize Gradient Change: A b\ < 10 e 6 

Table 5.1: Constraint Formulations by Scenario for Density Estimation 


We see from Figure 5.3 that although constrained optimization of log-likelihood 
can help identify the non-parametric dual-modality of the high-fidelity density, poor quan¬ 
tile and superquantile estimates result in tail weights that actually prove detrimental to 
overall approximations. Since quantile estimates for low/high /^-values were underes¬ 
timated/overestimated due to the nature of our DSE optimization objective, the resulting 
constraints for the MLP formulation ended up pushing too much density into the tail regions. 
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Figure 5.3: PDF Estimates Using FHigh-Fidelity Sample Only 




Results of PDF estimation on the high-fidelity sample data. Using only soft information 
and log-likelihood (left) we see a failure to properly estimate the tail regions for which no 
observations are present. Adding additional constraints, however, causes overestimation of 
the quantiles and superquantiles leads to an “over-filling” of the tail regions. 


5.3 Low-Fidelity Surrogate Modeling 

Estimates relying on only 10 observations can lead to predictions that prove either uninfor¬ 
mative or perhaps even counterproductive. As such, we attempt to leverage the plethora of 
low-fidelity observations by relying on the underlying correlation between data sets. We 
begin by fitting a linear model and assessing the quality of fit for the 10 observations from 
the high-fidelity sample and their associated low-fidelity values. A scatter plot showing the 
general correlations, as well as the 10 points used for the regression model are provided in 
Figure 5.4. As such, the resulting least squares linear model becomes 


Xh ~ Po + Pi X[ + 6 , 

Po — 8.702 x 10 ^, (5.1) 

Pi = 0.937. 

Provided a reasonable fit of the linear model (here we have R 2 ~ 0.95) and the fact 


46 






Figure 5.4: PDF Estimates Using Low-Fidelity Linear Approximations 



Low-Fidelity 

(a) High/Low-Fidelity Relation 



Low-Fidelity 

(b) Multi-Fidelity Linear Model 


Scatter plot (left) of high-to-low fidelity observations for all 875 observations and the linear 
model for predicting high-fidelity output from low-fidelity input values (right). The high- 
fidelity sample data (green) is common to both plots. 


that linear transformations preserve shape and distribution, we can translate mean, variance, 
and quantile values for the low-fidelity sample data into intervals for the high-fidelity set by 


X h * 

s A> + Xi, 

(5.2) 

VarfA/J s 

. /? T Var(A/), 

(5.3) 

Qx h ; 

- A> + P\Qx r 

(5.4) 


Additionally, we can estimate superquantile values from the transformed low-fidelity 
data itself, and by DSE approximation. For superquantiles estimated from the transformed 
low-fidelity data, we can further generate confidence intervals using bootstrapping if desired, 
though we will not attempt this here. Due to the larger sample size of 300, we can more 
confidently predict quantile/superquantile values for p closer to 0 and 1. In fact, because 
sample estimates tend to underestimate both quantile and superquantile values (due to lack 
of tail representation), and the tendency for these same estimates to be overestimated in 
DSE approximation (as seen earlier), we propose a weighted averaging of these predictions 
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to arrive at more accurate values. In this way, our estimates for quantiles and superquantiles 
are calculated according to 


Qavg(p) = OQsip) + (1 - 9)Q e (p), (5.5) 

where Q avg represents the weighted averaged of the sample quantile value (Q s ) and the epi- 
spline estimated quantile value ( Q e ). 6 provides the ratio of sample to epi-spline estimate 
weighting. Here, we will use 9 = 0.6. The results of this process are seen in Figure 5.5, 
where we note the greatly improved estimates obtained using a 60/40 weighted average. 

Figure 5.5: Quantile & Superquantile Estimation Comparisons 




(a) Quantile Estimation Accuracy (b) Superquantile Estimation Accuracy 

Comparison of quantile and superquantile estimation techniques. Recall that the constraints 
informed via DSE estimation (red) were consistent over-estimators, while those informed 
only by sample statistics (blue) were consistent under-estimators. Here, the averaged values 
(green) provide the most accurate quantile and superquantile estimates. 

Thus, we use the 300 low-fidelity observations to inform distributional characteris¬ 
tics of the high-fidelity data, such as mean, variance, quantiles, and superquantiles. Then, 
using these surrogate statistics, we impose the same formulation as before, but now with 
log-likelihood on the basis of the transformed low-fidelity observations. Results are seen 
in Figure 5.6. We note that by arriving at more accurate quantile and superquantile esti¬ 
mates, our tail densities have improved considerably. Despite this, because we rely solely 
on low-fidelity observations, we do not achieve the dual-modality we saw earlier with the 
high-fidelity formulations. 
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Figure 5.6: Density Using Low-Fidelity Surrogate Constraints 



Density estimation results when using the low-fidelity observations. Estimates are trans¬ 
formed via a linear relationship determined by the correlations of the 10 observations from 
the high-fidelity sample. Quantile and superquantile estimates are further estimated by a 
weighted average of sample statistics and DSE approximations. Optimization is done via a 
MLP for the transformed low-fidelity samples. Results show greatly improved tail densities 
when compared to using high-fidelity observations alone. 

5.4 High and Low Mixture Modeling 

As a third and final method, we intend to combine the strengths of the previous two 
techniques so as to achieve both the non-parametric dual-modality of the high-fidelity 
model, as well as the improved tail densities observed in the low-fidelity surrogate model. 
To do this, we retain our quantile and superquantile averaged estimates from the low-fidelity 
sample, and implement them in a MLP formulation on the high-fidelity observations. All 
other soft information remains consistent with previous iterations. The results of this 
blended model are displayed in Figure 5.7. 

The success of this method remains highly dependent on the accuracy of the linear 
model developed in section 5.3. In this case, the 10 high-fidelity data points used in 
model creation accurately reflected the preponderance of the high-to-low fidelity diagonal 
relationship seen in Figure 5.4. With only 10 observations, this will often not be the case, 
and in such circumstances, this method may actually be counterproductive. Thus, we point 
out not the validity of this method in all circumstances, but rather its potential uses provided 
a relatively high confidence in the high-to-low fidelity correlations. 
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Figure 5.7: PDF Estimates Using FHigFi/Low-Fidelity Mixture 




The results of constrained optimization of log-likelihood on the high-fidelity observations 
using a quantile and superquantile constraints derived from averaged low-fidelity sample and 
DSE estimates. Notice that we have achieved the dual-modality desired, as well as much 
more accurate tail density estimates. When plotted on a log-scale (bottom), we can see the 
points corresponding to where our quantile/superquantile constraints take effect. 
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CHAPTER 6: 
Conclusions 


We have introduced a new method for improving density estimates for data coming from 
known or suspected heavy tailed distributions. By leveraging the dual relationship between 
superquantiles and superexpectations, we incorporate an additional set of constraints within 
a density optimization framework in a manner that, to the best of this author’s knowledge, 
had not yet been attempted. By incorporating all potential sources of available information, 
we are able to provide tailored formulations for density approximations that enhance the 
fidelity of estimates for even small sample sizes. 

The work presented here represents an initial excursion into epi-spline density 
approximations utilizing quantile and superquantile estimates. We demonstrate and quantify 
the value of accurate quantile and superquantile estimates in two benchmark examples that 
cover exponential tails and a heavy-tailed Pareto distribution. The financial scenario applies 
the method to a non-parametric data set where we compare our method against an existing 
density estimation technique and show that even gross-level quantile approximations can 
improve tail density estimates when properly bounded. Finally, the hydrofoil example 
demonstrates the feasibility of informing quantile and superquantile constraints for densities 
of limited observations through a linear approximation of more abundant data on the basis 
of reasonable correlation. 

Although initial results show promise, they represent only a limited application of 
quantile and superquantile constraints within constrained epi-spline optimizations. Much 
work is left to be done, particularly in the realm of superquantile estimation itself. Second- 
order epi-splines, though flexible and relatively easy to formulate, offer limited degrees 
of freedom that can limit constraint combinations, a shortcoming that might be overcome 
through higher order epi-splines, such as those of the third or fourth order. Additionally, the 
choice of objective function in estimating the DSE arose through empirical observation on 
limited distributional examples. Better objective functions likely exist that can encourage 
the smooth and convex shapes seen in the DSE plots of Chapters 3-5. 

Right tail density estimation remained the focus of this work, though we recognize 
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there are numerous applications in which both the left and right tails are of significance. 
Although left unexplored, we expect that the addition of a second iteration of superquantile 
estimation prior to density approximation could quite reasonably achieve this goal by simply 
applying the optimizations of section 2.3 on the negatives of observed values. 

Quantification of density estimation accuracy remains an issue throughout this 
thesis. In cases where the underlying density is unknown (such as Chapters 4 and 5), 
we rely on other density approximations to reflect the “true” distribution. In some cases, 
these approximations inaccurately portray what we intuitively believe to be the true tail 
densities, as in the example of section 4.3. Additionally, because density values can range 
so dramatically between data sets, numerical errors should be scaled to properly reflect the 
relative, rather than absolute, impact of constraint implementation. Take, for instance, the 
extreme difference in density values between the data sets of Chapters 4 and 5. In light of 
these factors, we remain quite reliant on a qualitative evaluation of density approximation 
through the use of visual comparisons. 

The choice between an MLP and MEP formulation for density estimation remains 
an ambiguous issue with no clear answer. Given the objective of each, one can reasonably 
conclude that MLP formulations will likely underestimate tail density due to a potential 
lack of observations from those regions, thus recommending quantile and superquantile 
constraint formulations that utilize lower bounds on tail densities in order to “push” weight 
into the tail regions. Likewise, one can envision using these same constraints as an upper 
bound within an MEP framework in order to “rein in” the tails. Although this objec¬ 
tive/constraint relationship generally performs as intended, it is important to recognize an 
additional aspect distinguishing the two objective functions. Lor small samples with little 
additional soft information, we often cannot foresee issues of modality, non-symmetry, 
and other non-parametric characteristics. In these cases, because MEP formulations rely 
solely on constraints rather than the sample data itself, these distributional shape attributes 
can be overlooked. As in the hydrofoil example of Chapter 5, an MEP formulation would 
have failed to identify the dual-modality of the underlying distribution barring the explicit 
incorporation of a constraint requiring it. 

With the ability to incorporate a flexible and adaptive method of constraint formu¬ 
lation, epi-spline optimizations enable analysts to uniquely modify the process proposed 
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here to their particular application of interest, and based on their total available knowledge. 
Given the limited data sets explored, we find that incorporation of accurate quantile and 
superquantile estimates can significantly enhance density estimation, particularly for those 
distributions with heavy tails. The key in this implementation remains in the accuracy of 
the quantile/superquantile estimates themselves, which, through more evolved constrained 
optimization can be enhanced without the need for additional samples. We expect further 
exploration of the methods and processes posed here to engender epi-spline optimization 
for more extensive usage. 
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APPENDIX: Computations 


A.l Software Interface Algorithm 

We utilize the GDXRRW utility suite [20] to provide an interface between R and the General 
Algebraic Modeling System (GAMS). We begin by initializing simulation parameters and 
indexes such as sample size, iteration number, and parameter values (i.e., A, a). Passing 
these inputs from GAMS to R, we generate the random samples, compute sample statistics, 
and pass results back to GAMS for DSE and density optimizations. The results of these 
optimizations are then returned to R for plotting and error calculation. The general algorithm 
is as follows: 

Algorithm 1: DSE and PDF Optimization 

Results: Qxip), Q x(p), and f x Estimation Errors 
Initialize Sets, Parameters 
foreach Trial do 

Generate X sample data for specified distribution; 

Calculate sample statistics; 

Input soft information for f x per DSE scenario; 

Solve for DSE using Maximum Curvature 

Record a -coefficients and Qx(p), Qxip) predictions; 

Plot and record A AD, MAD errors; 

Apply quantile, superquantile constraints; 

Input soft information for f x ; 
if Maximum Log-Likelihood then 
Solve for f x using MLP; 

else 

Solve for f x using MEP; 

end 

Record ^-coefficients; 

Plot and record SSE, SSTE errors; 

end 
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A.2 Computational Time 

We performed all statistical and optimization calculations on a 2010 PC laptop with a 
2.53 GHZ Intel Core i5 processor and 4GB RAM. Statistical computations and error 
calculations were performed using R, while optimization were performed using GAMS. 
DSE optimization was done using the CONOPT solver, with 100 iterations taking less than 
4 minutes to perform 100 iterations in all scenarios. PDF optimization was performed 
using either the CONOPT or COINIPOPT solver depending on the complexity of constraint 
configurations. Numerical experimentation using 100 iterations as in Chapter 3 took less 
than 10 minutes in all cases for optimizations, with less than 5 minutes required in most 
cases. 


A.3 Bootstrapped Confidence Intervals 

From our original sample of X of size n , we perform B = 1,000 bootstrap iterations in 
which we sample from X (with replacement) so that each bootstrapped sample is also of 
size n. Calculating statistics of interest on these bootstrapped sample provides 1,000 values 
per statistic of interest. Sorting these statistics allows us to identify the corresponding 25th 
and 975th largest statistics, s{x* b ), of the distribution of B. These values then correspond 
to an approximate 95% confidence interval for the statistic s of the original sample X. 

Algorithm 2: Bootstrapping Sample Statistics 
Result: 95% Cl for s{x) 

Generate X sample data for specified distribution; 
foreach Iteration in 1 : B = 1,000 do 

Generate bootstrap sample: x* b = SamplcfA. n, replace = T); 

Calculate statistic: s(x* h ) 

Sort bootstrapped statistics 
return 250th and 975th largest s(x * b ) 

This relatively straightforward method of bootstrapping is called quantile-based 
intervals , and though simple, is often affected by bias and tends to provide less than the 
nominal coverage. For a more in-depth explanation of bootstrapping, see [16]. 
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A.4 Non-Parametric Binomial Confidence Intervals 


The use of binomial confidence intervals for quantile estimation is well-known. For a more 
in-depth explanation on the validity of this technique, see [21]. Here, we use the binomial 
approximation for estimating the median and quartile values of small samples, using a 95% 
confidence interval, [X,., X v ] where integer values r and s are the indexes of the sorted X 
determined as 


r = largest r s.t. ^ |. jp ! (l - p) n 1 < 0.025, 

5 = smallest 5 s.t. ^°- 975 - 


i =1 


A.5 Quantile and Superquantile Constraint Calculation 

Provided their central importance in constraint formulations, details regarding both quantile 
and superquantile constraint derivations are provided. 


Quantile: / 

Q(p) 


f(x)dx = 1 — p 


Superquantile 


CXJ 

"±1 

Q(p) 


xf(x)dx = Qx(p) 


Using Simpson’s Rule, which provides exact approximations for third order and below 
polynomials, we can simplify the integrals as follows. 


OO D 

Qx(p) ■ J f(x)dx = YjJ f^ dx ~ X 

Q(p) a ’ b a a ' b 


b - a 


f (a) + f(b) + 4/ 


a + b 


a 

Ax 

~6 


= Z 

m 

= 2 > 


(b% + b™x™) + (b'l; + b™x™) + 41 b™ + b™ 

b™ + v* 


Y m , Y m 
X L ^ X R 


Y tn , Y m \ n 
X L ^ X R 
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For the quantile constraints we sum over all m such that x'£ is greater than Q(p). This allows 
us to achieve an arbitrarily close approximation to the true integrals as our mesh resolution 
approaches zero. 
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A.6 Chapters 3 Additional Results 


Table A.l: SSE and SSTE for Exponential Benchmark 


Scenario 

SSE 

SSTE (0.80) 

SSTE (0.90) 

SSTE (0.95) 

PDF la 

1.272e-02 

2.117e-03 

1.114e-03 

5.434e-04 

PDF la* 

1.180e-02 

1.524e-03 

8.362e-04 

4.587e-04 

PDF 2a 

1.226e-02 

2.037e-03 

1.098e-03 

5.480e-04 

PDF 2a* 

1.207e-02 

1.662e-03 

1.156e-03 

8.138e-04 

PDF 3a 

7.179e-03 

1.129e-03 

5.155e-04 

3.032e-04 

PDF 3a* 

6.849e-03 

8.079e-04 

4.785e-04 

3.169e-04 

PDF 4a 

6.751e-03 

1.050e-03 

4.839e-04 

2.854e-04 

PDF 4a* 

6.403e-03 

8.063e-04 

4.692e-04 

2.703e-04 

PDF lb 

2.832e-02 

2.943e-03 

2.661e-03 

2.057e-03 

PDF lb* 

7.924e-03 

5.520e-04 

2.460e-04 

1.714e-04 

PDF 2b 

4.318e-03 

9.375e-04 

7.304e-04 

6.809e-04 

PDF 2b* 

2.540e-03 

5.610e-04 

4.077e-04 

2.388e-04 

PDF 3b 

2.056e-03 

1.100e-03 

2.623e-04 

2.250e-04 

PDF 3b* 

1.480e-03 

4.209e-04 

1.312e-04 

9.259e-05 

PDF 4b 

2.578e-05 

< 1.0e-08 

< 1.0e-10 

< 1.0e-10 

PDF 4b* 

2.087e-05 

2.070e-05 

3.697e-06 

4.462e-07 


The sum of squared errors (SSE) and tail errors (SSTE) of first-order epi-spline estimates 
against the true exponential distribution evaluated at each mesh intersection. Notice the impact 
of both increased distributional knowledge and the effect of accurate quantile/superquantile 
estimates. Here the tails correspond to prediction values to the right of particular exponential 
quantiles according to 


2(0.80) = 8.05 
2(0.90) = 11.51 
2(0.95) = 14.98 


59 




Table A.2: SSE and SSTE for Pareto Benchmark 


Scenario 

SSE 

SSTE (0.80) 

SSTE (0.90) 

SSTE (0.95) 

PDF la 

2.532e-02 

1.858e-03 

6.454e-04 

1.850e-04 

PDF la* 

2.514e-02 

1.391e-03 

5.196e-04 

2.164e-04 

PDF 2a 

2.648e-02 

1.914e-03 

6.390e-04 

1.878e-04 

PDF 2a* 

2.570e-02 

1.599e-03 

6.070e-04 

2.041e-04 

PDF 3a 

1.474e-02 

9.646e-04 

3.452e-04 

1.241e-04 

PDF 3a* 

1.469e-02 

8.039e-04 

3.601e-04 

1.305e-04 

PDF 4a 

1.460e-02 

9.121e-04 

3.274e-04 

1.163e-04 

PDF 4a* 

1.406e-02 

6.856e-04 

3.486e-04 

1.284e-04 

PDF lb 

1.830e-02 

8.479e-04 

3.381e-04 

8.028e-05 

PDF lb* 

1.675e-02 

7.580e-04 

2.917e-04 

7.802e-05 

PDF 2b 

9.742e-03 

4.146e-04 

2.123e-04 

8.107e-05 

PDF 2b* 

9.455e-03 

4.896e-04 

2.398e-04 

6.041e-05 

PDF 3b 

1.739e-03 

3.094e-04 

9.100e-05 

4.425e-05 

PDF 3b* 

1.991e-03 

2.629e-04 

9.746e-05 

1.719e-05 

PDF 4b 

1.529e-03 

1.352e-04 

6.548e-05 

1.328e-05 

PDF 4b* 

1.642e-03 

1.566e-04 

6.632e-05 

7.796e-06 


The sum of squared errors (SSE) and tail errors (SSTE) of first-order epi-spline estimates 
against the true Pareto distribution evaluated at each mesh intersection. Here the tails corre¬ 
spond to prediction values to the right of particular Pareto quantiles according to 

2 ( 0 . 80 ) = 7.10 
2 ( 0 . 90 ) = 11.51 
2 ( 0 . 95 ) = 17.14 


60 




Figure A.l: DSE and Superquantile Estimates for Exponential Scenarios 1 and 3 




(a) Scenario 1 DSE Estimates 


(b) Scenario 1 Superquantile Estimates 



0.0 0.2 0.4 0.6 0.8 

P 



(c) Scenario 3 DSE Estimates 


(d) Scenario 3 Superquantile Estimates 


Dual of superexpectations (left) and superquantile (right) second-order epi-spline estimates 
for the exponential case using constraints via Scenario 1 (top) and Scenario 3 (bottom) 
and optimized for curvature. The true values are shown in solid red. Notice the generally 
conservative estimates made, regularly overestimating the superexpectations. 
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Figure A.2: PDF Estimates for Exponential Scenarios lb and 4b 




(a) Scenario lb PDF (b) Scenario la PDF Log-Scale 




(c) Scenario 4b PDF (d) Scenario 4b PDF Log-Scale 

Density estimates (left), also shown on a log-scale (right), for first-order epi-spline estimates 
for Scenarios lb (top) and 4b (bottom) of the Exponential benchmark. The true values are 
shown in solid red. Increased knowledge between scenarios results in more accurate density 
estimates. For Scenario 4b, results are so accurate that epi-spline estimates are no longer 
visible. 
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Figure A.3: DSE and Superquantile Estimates for Pareto Scenarios 1 and 3 



0.0 0.2 0.4 0.6 0.8 1.0 

P 

(a) Scenario 1 DSE Estimates 



(c) Scenario 3 DSE Estimates 



(b) Scenario 1 Superquantile Estimates 



(d) Scenario 3 Superquantile Estimates 
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Dual of superexpectations (left) and superquantile (right) second-order epi-spline estimates for 
the Pareto case using constraints via Scenario 1 (top) and Scenario 3 (bottom) and optimized 
for curvature. The true values are shown in solid red. 










Figure A.4: PDF Estimates for Pareto Scenarios lb and 4b 




(a) Scenario lb PDF 


(b) Scenario la PDF Log-Scale 




(c) Scenario 4b PDF 


(d) Scenario 4b PDF Log-Scale 


Density estimates (left), also shown on a log-scale (right), for first-order epi-spline estimates 
for Scenarios lb (top) and 4b (bottom) of the Pareto benchmark. The true values are shown 
in solid red. 
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